class: center, middle, inverse, title-slide .title[ # Statistical Concepts Everyone Should Know ] .subtitle[ ##
Statistics for Life
] .author[ ###
John Slough
for
The John & Calvin Podcast
]

---
class: inverse, center, large

<h1 style="font-size: 80px; margin-top: 150px;">Beware the Metrics</h1>
<hr style="margin-top: 2em; margin-bottom: 2em;">

--

<h2 style="font-size: 50px;">Misuse and Misunderstanding, Not Math, Mislead</h2>

--

<h2 style="font-size: 40px;">mean ≠ median</h2>

---

## 1. Mean vs Median

### Local bar wealth distribution: 50 Patrons
---

## 1. Mean vs Median

### Local bar wealth distribution: 50 Patrons + Billon Gezos
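---

## 1. Mean vs Median

A minimal sketch of the bar example in Python, using only the standard library. The 50 patron values are made up (drawn uniformly between 20,000 and 150,000 dollars), and the newcomer is assumed to be worth 150 billion dollars; none of these numbers come from real data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical net worths (in dollars) for 50 ordinary patrons.
patrons = [random.uniform(20_000, 150_000) for _ in range(50)]

# One assumed billionaire walks in.
outlier = 150_000_000_000
bar = patrons + [outlier]

# Mean: sum of all values divided by the number of values.
mean_before = statistics.mean(patrons)
mean_after = statistics.mean(bar)

# Median: middle value once the values are ordered.
median_before = statistics.median(patrons)
median_after = statistics.median(bar)

print(f"before: mean = {mean_before:,.0f}   median = {median_before:,.0f}")
print(f"after:  mean = {mean_after:,.0f}   median = {median_after:,.0f}")
```

One extra patron pushes the mean into the billions, because the outlier alone contributes 150 billion / 51 to it. The median can shift by at most one position in the ordered list, so it stays within the range of the ordinary patrons.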
---

## 1. Mean vs Median

<div style="font-size: 130%; text-align: left; margin: 100px 0;">
<p><strong>Mean</strong>: Sum of all values divided by the number of values.</p>
<br>
<p><strong>Median</strong>: Middle value when the values are ordered.</p>
</div>

---

## 1. Mean vs Median

### Impact of a $150B Outlier on Mean vs. Median Across Sample Sizes

<table class="table" style="font-size: 24px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:right;"> Sample Size </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Median </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 101 </td> <td style="text-align:right;padding-left: 20px;"> $1,485,214,194 </td> <td style="text-align:right;padding-left: 20px;"> $61,587 </td> </tr>
<tr> <td style="text-align:right;"> 1,001 </td> <td style="text-align:right;padding-left: 20px;"> $149,915,313 </td> <td style="text-align:right;padding-left: 20px;"> $60,141 </td> </tr>
<tr> <td style="text-align:right;"> 10,001 </td> <td style="text-align:right;padding-left: 20px;"> $15,063,285 </td> <td style="text-align:right;padding-left: 20px;"> $59,610 </td> </tr>
<tr> <td style="text-align:right;"> 100,001 </td> <td style="text-align:right;padding-left: 20px;"> $1,564,864 </td> <td style="text-align:right;padding-left: 20px;"> $59,897 </td> </tr>
<tr> <td style="text-align:right;"> 1,000,001 </td> <td style="text-align:right;padding-left: 20px;"> $214,846 </td> <td style="text-align:right;padding-left: 20px;"> $59,868 </td> </tr>
<tr> <td style="text-align:right;"> 10,000,001 </td> <td style="text-align:right;padding-left: 20px;"> $79,864 </td> <td style="text-align:right;padding-left: 20px;"> $59,885 </td> </tr>
<tr> <td style="text-align:right;"> 100,000,001 </td> <td style="text-align:right;padding-left: 20px;"> $66,365 </td> <td style="text-align:right;padding-left: 20px;"> $59,876 </td> </tr>
</tbody>
</table>

---

## 1. Mean vs Median

A more realistic example

---

.pull-left[

### Economics & Wealth

- GDP per capita
- household income
- net worth
- home price
- monthly rent
- CEO compensation

### Health & Healthcare

- life expectancy
- healthcare spending per person
- hospital bill
- patient out-of-pocket cost

]

.pull-right[

### Academics

- test score (SAT, PISA)
- GPA
- academic citations
- speaking invitations per expert

### Other

- commute time
- screen-time per user
- household energy use
- carbon emissions per capita
- YouTube ad revenue per channel
- revenue per app in the App Store
- software bug fix times

]

---

## 1. Mean vs Median

### Pareto Distribution
--- ## 1. Mean vs Median ### Pareto Distribution (x-axis limited)
--- ## 1. Mean vs Median ### Normal Distribution
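
---

## 1. Mean vs Median

A rough sketch of how the shape of the data drives the gap between mean and median. It draws one heavy-tailed sample (Pareto) and one symmetric sample (normal). The shape parameter 1.5, the 30,000 scale, and the normal's mean and standard deviation are all assumed values chosen for illustration, not fitted to any real data.

```python
import random
import statistics

random.seed(0)
n = 100_000

# Heavy-tailed sample: Pareto with assumed shape 1.5, scaled to a 30,000 minimum.
skewed = [30_000 * random.paretovariate(1.5) for _ in range(n)]

# Symmetric sample: normal with assumed mean 60,000 and standard deviation 15,000.
symmetric = [random.gauss(60_000, 15_000) for _ in range(n)]

def mean_to_median_ratio(values):
    """Return mean / median; values near 1 suggest a roughly symmetric sample."""
    return statistics.fmean(values) / statistics.median(values)

skewed_ratio = mean_to_median_ratio(skewed)
symmetric_ratio = mean_to_median_ratio(symmetric)

print(f"Pareto: mean/median = {skewed_ratio:.2f}")
print(f"Normal: mean/median = {symmetric_ratio:.2f}")
```

With these settings the Pareto sample's mean should land well above its median, while the normal sample's ratio stays close to 1. When the two statistics disagree sharply, that is a hint to look at the distribution before quoting either one.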
--- ## 1. Mean vs Median <br> ## Extreme Outliers <br> -- ## Symmetric vs Skewed <br> -- ## the Shape of the Data --- class: inverse, center, large <h1 style="font-size: 80px; margin-top: 200px;">Distributions</h1> <hr style="margin-top: 2em; margin-bottom: 2em;"> -- <h2 style="font-size: 60px;">The Foundation Beneath Every Statistic</h2> --- ## 2. Distributions ### Theoretical models of where values are likely to occur - **Symmetry vs. Skewness** Are values evenly spread around the center, or do they stretch more in one direction? - **Variability (Spread)** How tightly or widely are the values clustered? - **Tail Behavior** How likely are extreme or outlier values? ---  ---  --- ## 2. Distributions $$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 } $$ Where: - `\(\mu\)` = the **mean** (center) - `\(\sigma\)` = the **standard deviation** (spread) - `\(x\)` = the value at which we evaluate the density - `\(f(x)\)` = the probability density at `\(x\)` given `\(\mu\)` and `\(\sigma\)` --- .center[ **Normal distribution** ] $$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 } $$ <hr style="border: 0; height: 1px; background: lightgray;"> <div style="text-align: center;"> <img src="assets/normal_annotated.png"> </div> --- ## 2. Distributions - Outcomes fall across possibilities - Those possibilities form a distribution - Identifying the distribution can shape your analysis --- class: inverse, center, large <h1 style="font-size: 80px; margin-top: 150px;">Correlation, Confounding, Causation</h1> <hr style="margin-top: 2em; margin-bottom: 2em;"> -- <h2 style="font-size: 80px;">and the Stories We Tell</h2> --- class: inverse, center, large, middle <h1 style="font-size: 80px; ">Correlation</h1> ---  ---  --- class: inverse, center, large, middle <h1 style="font-size: 80px; ">Confounding</h1> --- .pull-left[
- Model: `reading ~ shoe size` - Model p-value (shoe): < 0.001 ] --- .pull-left[
- Model: `reading ~ shoe size` - Model p-value (shoe): < 0.001 ] .pull-right[
- Model: `reading ~ age` - Model p-value (age): < 0.001 ] --- .pull-left[
- Model: `reading ~ shoe size` - Model p-value (shoe): < 0.001 ] .pull-right[
- Model: `reading ~ age` - Model p-value (age): < 0.001 ] <hr style="border: 0; height: 1px; background: lightgray;"> .center[ Combined Model: `reading ~ shoe + age` Shoe p-value: 0.314; Age p-value: **< 0.001** > **Result:** Once age is included, shoe size is no longer significant ] --- class: inverse, center, large, middle <h1 style="font-size: 80px; ">Causation</h1> --- ### How do we test if A causes B? <hr style="border: 0; height: 1px; background: lightgray;"> -- .pull-left[ <div style="margin-top: 4px;"> <b>Randomized Controlled Trial</b> <ul style="font-size: 80%;"> <li><b>Randomization:</b> Assign by chance to treatment or control</li> <br> <li><b>Control:</b> Keep other factors balanced across groups (via randomization)</li> <br> <li><b>Treatment:</b> Apply the intervention</li> <br> <li><b>Comparison:</b> Check for outcome differences between groups</li> </ul> </div> ] .pull-right[
] <br> > Experiments test causality by manipulating one factor, > using randomization to balance everything else, > and comparing outcomes. --- class: inverse, center, large, middle <h1 style="font-size: 80px; ">The Stories We Tell</h1> ---
---
<small style="font-size:12px; line-height:1.1; display:inline-block;"> "Figure 2 correlates saturated fat and total vegetable oil consumption versus heart disease deaths in the U.S.A., with data on all three dating back to at least 1909."</small> <small style="font-size:12px; line-height:1; display:inline-block;"> Data: <a href="https://www.fns.usda.gov/cnpp/us-foodsupply/nutrient-content-1909-2010">USDA Food Supply</a> | Paper: <a href="https://www.sciencedirect.com/science/article/abs/pii/S0306987717305017?via%3Dihub">ScienceDirect</a> | Video: <a href="https://www.youtube.com/watch?v=7kGnfXXIKZM">YouTube</a> </small> --- class: inverse, center, large <h1 style="font-size: 80px; margin-top: 200px;">Shapes of Change</h1> <hr style="margin-top: 2em; margin-bottom: 2em;"> -- <h2 style="font-size: 60px;">What Curve, What Point?</h2> --- ## 4. Shapes of Change
---

## 5. Bias

**Bias** is the difference between the expected value of an estimator and the true value of the parameter it estimates.

Formally:

$$
\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
$$

Where:

- `\(\hat{\theta}\)` = the **estimator** (your calculated estimate)
- `\(\theta\)` = the **true parameter** (the real value you want)
- `\(\mathbb{E}[\hat{\theta}]\)` = the **expected value** of the estimator (its long-run average over many samples)

<br>

- If Bias = 0 → the estimator is **unbiased**.
- If Bias ≠ 0 → the estimator is **biased** (systematically too high or too low).

---

Bias doesn’t just shift your first guess. It filters what you see next, making you even more biased over time.

- **Selection bias:** The sample is not representative of the population.
- **Omitted variable bias:** Leaving out a variable that influences both the dependent and independent variables.
- **Measurement bias (or information bias):** Errors in how data is collected or recorded.
- **Survivorship bias:** Only analyzing "survivors" or those who remain, ignoring those who dropped out or failed.
- **Recall bias:** Errors because people remember things inaccurately (common in surveys and retrospective studies).
- **Observer bias:** Researcher's expectations subtly influence measurements or observations.
- **Publication bias:** Studies with "positive" results are more likely to be published than "null" or "negative" results.

---

## 5. Bias

The Self-Reinforcing Feedback Loop of Bias
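
---

## 5. Bias

Returning to the formal definition: bias can be estimated numerically by averaging an estimator over many simulated samples and subtracting the true value. The sketch below uses a textbook case chosen for illustration: the variance estimator that divides by n versus the one that divides by n - 1. The population (normal, standard deviation 2), the sample size, and the number of trials are all assumptions.

```python
import random
import statistics

random.seed(7)

TRUE_VARIANCE = 4.0   # theta: variance of the simulated population (sd = 2)
n = 5                 # a small sample size makes the bias easy to see
trials = 20_000       # number of simulated samples to average over

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(trials):
    sample = [random.gauss(0, 2) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))          # divides by n
    divide_by_n_minus_1.append(statistics.variance(sample))   # divides by n - 1

# Bias = (long-run average of the estimator) - (true parameter)
bias_n = statistics.fmean(divide_by_n) - TRUE_VARIANCE
bias_n_minus_1 = statistics.fmean(divide_by_n_minus_1) - TRUE_VARIANCE

print(f"divide by n:     estimated bias = {bias_n:+.2f}")
print(f"divide by n - 1: estimated bias = {bias_n_minus_1:+.2f}")
```

Dividing by n is systematically too low: its expected value is (n - 1)/n times the true variance, so the bias here should come out near -0.8. Dividing by n - 1 should give a bias near 0, up to simulation noise.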
--- ### Consequence of the Self-Reinforcing Feedback Loop of Bias Conceptual Edition
--- ### Consequence of the Self-Reinforcing Feedback Loop of Bias Carnivore Edition
--- ### Consequence of the Self-Reinforcing Feedback Loop of Bias Vegan Edition